{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cf723ed9-d8e0-4f1a-911f-887b927f8569",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0] # 哆啦A梦与超级赛亚人：时空之战\n",
      "\n",
      "[1] 在一个寻常的午后，大雄依旧坐在书桌前发呆，作业堆得像山，连第一页都没动。哆啦A梦在一旁翻着漫画，时不时叹口气，觉得这孩子还是一如既往的不靠谱。正当他们的生活照常进行时，一道强光突然从天而降，整个房间震动不已。光芒中走出一名金发少年，身披战甲、气势惊人，他就是来自未来的超级赛亚人——特兰克斯。他一出现便说出了惊人的话：未来的地球即将被黑暗势力摧毁，他来此是为了寻求哆啦A梦的帮助。\n",
      "\n",
      "[2] 哆啦A梦与大雄听后大惊，但也从特兰克斯坚定的眼神中读出了不容拒绝的决心。特兰克斯解释说，未来的敌人并非普通反派，而是一个名叫“黑暗赛亚人”的存在，他由邪恶科学家复制了贝吉塔的基因并加以改造，实力超乎想象。这个敌人不仅拥有赛亚人战斗力，还能操纵扭曲的时间能量，几乎无人可敌。特兰克斯已经独自战斗多年，但每一次都以惨败告终。他说：“科技，是我那个时代唯一缺失的武器，而你们，正好拥有它。”\n",
      "\n",
      "[3] 于是，哆啦A梦带着特兰克斯与大雄启动时光机，穿越到了那个即将崩溃的未来世界。眼前的景象令人震撼：城市沦为废墟，大地裂痕纵横，天空中浮动着压抑的黑雾。特兰克斯说，这正是黑暗赛亚人带来的结果，一切生命几乎都被抹杀，只剩他在苦苦支撑。大雄虽感到恐惧，但看到无辜的人类遭殃，内心逐渐燃起斗志。哆啦A梦则冷静地分析局势，决定使用他最强的三样秘密道具来对抗黑暗势力。\n",
      "\n",
      "[4] 三件秘密道具分别是：可以临时赋予超级战力的“复制斗篷”，能暂停时间五秒的“时间停止手表”，以及可在一分钟中完成一年修行的“精神与时光屋便携版”。大雄被推进精神屋内，在其中接受密集的训练，虽然只有几分钟现实时间，他却经历了整整一年的苦修。刚开始他依旧软弱，想放弃、想逃跑，但当他想起静香、父母，还有哆啦A梦那坚定的眼神时，他终于咬牙坚持了下来。出来之后，他的身体与精神都焕然一新，眼神中多了一份成熟与自信。\n",
      "\n",
      "[5] 最终战在黑暗赛亚人的空中要塞前爆发，特兰克斯率先出击，释放全力与敌人正面对决。哆啦A梦则用任意门和道具支援，从各个方向制造混乱，尽量压制敌人的时空能力。但黑暗赛亚人太过强大，仅凭特兰克斯一人根本无法压制，更别说击败。就在特兰克斯即将被击倒之际，大雄披上复制斗篷、冲破恐惧从高空跃下。他的拳头燃烧着金色光焰，目标直指敌人心脏。\n",
      "\n",
      "[6] 时间停止装置在关键时刻启动，世界陷入静止，大雄用这个短短五秒接近了敌人的盲点。他集中全力，一记重拳击穿了黑暗赛亚人的能量核心，引发巨大的能量反冲。黑暗赛亚人尖叫着化为碎光，天空中的黑雾瞬间散去，阳光重新洒落大地。特兰克斯倒在地上，看着眼前这个曾经懦弱的少年，露出了欣慰的笑容。他知道，这一次，是大雄救了世界。\n",
      "\n",
      "[7] 战后，未来世界开始恢复，植物重新生长，人类重建家园。特兰克斯告别时紧紧握住大雄的手，说：“你是我见过最特别的战士。”哆啦A梦也为大雄感到骄傲，说他终于真正成长了一次。三人站在山丘上，看着远方重新明亮的地平线，心中感受到从未有过的安宁。随后，哆啦A梦与大雄乘坐时光机返回了属于他们的那个年代，一切仿佛又恢复平静。\n",
      "\n",
      "[8] 回到现代后，大雄仿佛变了一个人，不再轻易抱怨、不再逃避责任。他认真写完作业，帮妈妈买菜，甚至主动练习体育，哆啦A梦惊讶得说不出话来。他知道，这不是一时兴起，而是大雄真正内心成长的结果。大雄有时会望着天空出神，仿佛还能看见未来世界的那一片废墟与重生的希望。他不会说出来，但他心中永远铭记那一战。\n",
      "\n",
      "[9] 几天后，电视新闻中突然出现一则画面：一位金发少年在街头击退了失控的机器人，引发市民围观与猜测。大雄放下手中的课本，望向哆啦A梦，两人心照不宣地笑了。也许，特兰克斯又回来了，也许，新的敌人正在逼近。冒险从未真正结束，而他们，早已准备好了。无论时空如何动荡，他们将永远并肩作战。\n",
      "\n"
     ]
    }
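   ],
   "source": [
    "from typing import List\n",
    "\n",
    "def split_into_chunks(doc_file: str) -> List[str]:\n",
    "    \"\"\"Split a document into chunks on blank lines (paragraph breaks).\"\"\"\n",
    "    # Read explicitly as UTF-8 (assumes doc.md is saved as UTF-8): the text is\n",
    "    # Chinese, and the platform default encoding is not always UTF-8.\n",
    "    with open(doc_file, 'r', encoding='utf-8') as file:\n",
    "        content = file.read()\n",
    "\n",
    "    return content.split(\"\\n\\n\")\n",
    "\n",
    "chunks = split_into_chunks(\"doc.md\")\n",
    "\n",
    "# Print each chunk with its index to check the split.\n",
    "for i, chunk in enumerate(chunks):\n",
    "    print(f\"[{i}] {chunk}\\n\")"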
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "cfe9bf60-5d21-4696-99a5-7e7f3b94dd06",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "768\n",
      "[0.02680548094213009, 0.008382038213312626, 0.0003433567762840539, 0.007299014367163181, 0.05433317646384239, -0.05325588956475258, 0.0013655528891831636, -0.0013182297116145492, -0.03671124950051308, 0.07188178598880768, -0.0072706579230725765, -0.007053006440401077, 0.04253285378217697, -0.03675279766321182, -0.054750557988882065, -0.009598560631275177, 0.01710553467273712, 0.05915359780192375, -0.03335001692175865, 0.062376558780670166, -0.00488852895796299, -0.034539543092250824, -0.07407604902982712, 0.044221922755241394, 0.010516896843910217, -0.037077806890010834, -0.027029866352677345, 0.03830365464091301, 0.02128247544169426, -0.011811469681560993, -0.005408741533756256, 0.002659012097865343, -0.02329857461154461, 0.05299091339111328, 0.0051494501531124115, 0.029624179005622864, -0.030809661373496056, -0.01785610057413578, 0.042446039617061615, -0.007692299783229828, -0.010638092644512653, 0.03210865706205368, -0.06592466682195663, -0.01210092008113861, 0.0068145873956382275, -0.0011549683986231685, -0.020827766507864, 0.027529576793313026, -0.045469965785741806, 0.051770180463790894, -0.051474228501319885, 0.256486177444458, 0.05031630024313927, 0.017564307898283005, -0.011565234512090683, -0.013389555737376213, 0.016082623973488808, 0.01845124550163746, -0.006780578289180994, 0.031050030142068863, 0.05057542398571968, -0.012529821135103703, 0.06348630785942078, -0.048402804881334305, 0.0009180836495943367, 0.021004773676395416, -0.010703865438699722, -0.06085040792822838, 0.006804729346185923, 0.05067179724574089, -0.024943431839346886, -0.012202027253806591, 0.018374765291810036, 0.03233983740210533, 0.042311057448387146, 0.00020395567116793245, -0.017155051231384277, -0.016037961468100548, -0.03760528191924095, -0.0436251163482666, -0.041648779064416885, 0.016573484987020493, 0.009964797645807266, -0.06010220944881439, -0.027282942086458206, 0.020031124353408813, 0.010414332151412964, -0.04751446098089218, -0.018612153828144073, 
-0.011965063400566578, -0.06262180954217911, -0.04016494378447533, -0.00942967925220728, 0.019113536924123764, 0.00968236941844225, 0.0019715186208486557, 0.044232532382011414, -0.003862484823912382, 0.00029376932070590556, 0.0358089916408062, -0.03436819091439247, -0.0068693384528160095, -0.029506126418709755, 0.002121210331097245, -0.016589511185884476, -0.014159927144646645, -0.018725136294960976, 1.735198384267278e-05, 0.0009454188984818757, 0.04507480934262276, 0.006473740562796593, 0.03398652374744415, -0.0045067546889185905, 0.049304503947496414, -0.01237794291228056, 0.028520235791802406, 0.028152696788311005, -0.01200655847787857, 0.01223220955580473, 0.003402916481718421, -0.015172971412539482, -0.022247157990932465, -0.014169584028422832, 0.023108789697289467, 0.009890854358673096, 0.04284040629863739, -0.030761033296585083, -0.03398945555090904, 0.018358435481786728, 0.038009535521268845, 0.02349196746945381, 0.021494237706065178, 0.0005228004301898181, -0.09578021615743637, 0.020698755979537964, -0.01241658627986908, 9.56698422669433e-05, -0.004413336515426636, -0.06176118925213814, -0.0637013241648674, 0.042515918612480164, -0.019275754690170288, -0.021949520334601402, 0.009037390351295471, 0.015709180384874344, 0.03664668649435043, -0.011252578347921371, 0.030907971784472466, -0.007981755770742893, -0.020503826439380646, 0.018466180190443993, 0.03974352031946182, -0.025002002716064453, 0.050263673067092896, 0.007021953351795673, 0.008837774395942688, 0.05353927239775658, -0.022880304604768753, -0.06497209519147873, 0.045547958463430405, -0.018929122015833855, -0.00956945400685072, 0.01566329039633274, -0.01911521889269352, -0.03303702175617218, 0.00913039781153202, -0.01480466965585947, -0.0005449760355986655, 0.03782360255718231, -0.024828270077705383, -0.020784934982657433, 0.06556414067745209, 0.05158452317118645, 0.027920948341488838, 0.021408185362815857, 0.018681302666664124, -0.006588047370314598, 0.04769236594438553, 0.011029955931007862, 
0.02286238595843315, -0.1066974475979805, 0.017980778589844704, 0.060067132115364075, -0.020247286185622215, -0.013238555751740932, -0.003757177386432886, -0.03314714878797531, 0.03496038168668747, 0.002900641178712249, 0.00540244160220027, -0.011998503468930721, -0.0029790785629302263, -0.019136987626552582, -0.019632592797279358, 0.04240811988711357, -0.009020973928272724, -0.007898873649537563, -0.021550869569182396, -0.03288109973073006, -0.03879507631063461, -0.004414174240082502, -0.01377736683934927, 0.0023896759375929832, 0.008905782364308834, 0.01703587733209133, 0.018444158136844635, -0.0013177728978917003, -0.039486613124608994, -0.04127557948231697, -0.011266766116023064, 0.020374279469251633, -0.03370969370007515, 0.00020458259677980095, -0.009100593626499176, 0.025252513587474823, 0.035498522222042084, 0.03128746524453163, 0.014305442571640015, -0.06018894538283348, -0.014268027618527412, -0.025481710210442543, -0.020244726911187172, -0.026308294385671616, 0.031466081738471985, -0.0014750557020306587, 0.0024535218253731728, -0.003878351068124175, -0.0765811949968338, 0.034462977200746536, -0.02106558531522751, -0.026005137711763382, -0.03663505241274834, -0.04224966838955879, 0.04329920560121536, 0.01358682569116354, 0.018750885501503944, 0.025812407955527306, 0.03914476931095123, -0.03560348227620125, -0.039731938391923904, -0.01938370056450367, 0.03881314396858215, -0.028673522174358368, -0.04575642943382263, -0.030043575912714005, -0.005910517647862434, 0.030279846861958504, -0.007141618523746729, 0.01975637674331665, -0.023295726627111435, -0.033641066402196884, 0.028904961422085762, -0.02478053979575634, 0.0008984373416751623, 0.06980863213539124, 0.020071830600500107, 0.02953665889799595, 0.011422575451433659, 0.043476223945617676, -0.06445600837469101, -0.01440176647156477, 0.014391297474503517, 0.056192584335803986, -0.024886632338166237, -0.04148560389876366, 0.037213318049907684, -0.06418780237436295, -0.004165202844887972, 
0.0003651363658718765, -0.029715651646256447, -0.019475232809782028, 0.029696771875023842, -0.012616567313671112, -0.0030558141879737377, -0.002162749180570245, -0.07429531216621399, 0.012262551113963127, -0.00017990762717090547, 0.012211722321808338, -0.025008447468280792, -0.018118904903531075, -0.05154562368988991, 0.012818492949008942, -0.041998740285634995, -0.011001654900610447, -0.03452014923095703, -0.04893345758318901, 0.026315098628401756, 0.043589070439338684, -0.025372901931405067, 0.013372876681387424, -0.015835825353860855, -0.018673522397875786, 0.0004607595910783857, 0.14620473980903625, -0.022349515929818153, 0.002066087443381548, 0.020974675193428993, 0.05730258300900459, 0.05779650807380676, -0.007015353534370661, -0.05949036404490471, -0.0562482513487339, 0.07196559011936188, 0.008784073404967785, 0.03127358481287956, -0.0328294113278389, -0.031061779707670212, -0.038107626140117645, -0.08086157590150833, -0.02217821218073368, 0.01967569626867771, 0.06560678035020828, -0.01361657865345478, -0.0432109534740448, 3.212993033230305e-05, 0.007617776747792959, 0.059684447944164276, 0.011839399114251137, -0.007775966543704271, -0.02267345041036606, 0.06027139723300934, 0.030707618221640587, 0.1009594276547432, -0.018273744732141495, 0.005221743602305651, -0.018574150279164314, 0.046662501990795135, -0.039840999990701675, 0.06332744657993317, 0.017829323187470436, -0.016721289604902267, 0.05946602299809456, 0.05359222739934921, 0.033750373870134354, 0.01619795896112919, -0.02781626768410206, -0.04610617086291313, 0.028852568939328194, 0.007923398166894913, -0.003821414662525058, -0.02448606863617897, 0.01644233427941799, -0.0143220080062747, -0.01953458972275257, -0.022928403690457344, 0.0008579015848226845, -0.05842091515660286, -0.022136928513646126, 0.008564661256968975, 0.003413505619391799, 0.06435392796993256, -0.028461040928959846, -0.01898174360394478, -0.003873726585879922, -0.015333231538534164, 0.033341072499752045, 0.0047195772640407085, 
0.05163421481847763, 0.012878579087555408, 0.01577749475836754, -0.017106937244534492, -0.026351606473326683, -0.002479263348504901, 0.023901542648673058, 0.02081417478621006, -0.007952293381094933, 0.029625149443745613, -0.09627807885408401, 0.04161232337355614, 0.0148893678560853, 0.0345672145485878, 0.021369902417063713, -0.023457953706383705, -0.0010088582057505846, 0.021206457167863846, 0.02020874246954918, 0.05265496298670769, -0.014558154158294201, -0.007277476601302624, -0.020998140797019005, -0.013605271466076374, 0.0324535071849823, -0.05961483344435692, -0.03337843343615532, 0.020222201943397522, 0.012565594166517258, -0.03519878163933754, -0.007077857386320829, -0.028356319293379784, -0.08278723061084747, 0.013192405924201012, 0.011490912176668644, -0.01027492992579937, 0.11089946329593658, 0.007383037358522415, -0.02479495294392109, 0.07341889292001724, -0.03335028886795044, -0.02382080629467964, -0.0029024751856923103, 0.0020614792592823505, -0.00572629552334547, 0.024756118655204773, 0.05605889856815338, -0.1111818253993988, -0.021946720778942108, -0.016140097752213478, 0.04339050129055977, 0.0037102752830833197, -0.03505484759807587, 0.03899580240249634, 0.011435077525675297, 0.02022169902920723, -0.026690296828746796, 0.0048322430811822414, -0.015849657356739044, -0.053173620253801346, 0.08264001458883286, -0.02741048112511635, 0.003806737717241049, 0.021086202934384346, 0.011895634233951569, 0.004174356814473867, -0.010561533272266388, -0.04180792719125748, -0.03417472541332245, -0.04522692412137985, 0.010197128169238567, -0.030837610363960266, -0.004010607488453388, -0.06798148900270462, -0.011550945229828358, 0.0079416548833251, -0.0156096825376153, 0.0025790692307054996, -0.015110064297914505, -0.00895465537905693, 0.020070435479283333, -0.03537009656429291, -0.056165773421525955, -0.002300518797710538, 0.024881284683942795, -0.0084794657304883, 0.03194727376103401, 0.048944197595119476, 0.021891988813877106, -0.035894449800252914, 
0.03244776651263237, -0.0005915041547268629, 0.004301982466131449, 0.04572706297039986, -0.048882391303777695, -0.059860728681087494, 0.06363465636968613, -0.024566177278757095, -0.007733828853815794, -0.0016328325727954507, 0.0020884776022285223, -0.041061826050281525, 0.06061314791440964, -0.021700816228985786, -0.06142507493495941, 0.028309106826782227, 0.04443180561065674, -0.020188920199871063, -0.0032107275910675526, -0.0063331592828035355, 0.05330388620495796, 0.03841032460331917, 0.023930739611387253, 0.07729265838861465, -0.007035878021270037, 0.010095823556184769, 0.0035082006361335516, -0.04364694282412529, 0.019585996866226196, -0.027304761111736298, -0.03865275904536247, -0.008418368175625801, 0.016983961686491966, -0.08224362134933472, -0.003916573245078325, -0.03603890910744667, -0.00291751092299819, -0.01797185279428959, -0.01928059756755829, 0.030863601714372635, 0.04893181845545769, -0.009585103020071983, -0.08360496163368225, -0.0225935447961092, -0.012387729249894619, -0.011543945409357548, -0.037867266684770584, -0.06551000475883484, 0.035191841423511505, 0.041023898869752884, -0.08397973328828812, -0.017963120713829994, 0.006989809684455395, -0.04847666621208191, 0.015127982944250107, -0.04108656942844391, -0.012682516127824783, -0.006762446369975805, -0.08201920241117477, -0.021286631003022194, 0.015313131734728813, 0.07352934777736664, -0.03893899917602539, -0.015363158658146858, 0.00020186383335385472, 0.032448332756757736, -0.02580149471759796, 0.012067311443388462, 0.023155804723501205, 0.057409729808568954, 0.03198215737938881, 0.001448770402930677, -0.007571099791675806, 0.001817332929931581, -0.014613229781389236, -0.02007727324962616, -0.01991691067814827, 0.022061645984649658, -0.020931951701641083, -0.007911404594779015, -0.015219900757074356, 0.04887213557958603, -0.02920396253466606, -0.01738613285124302, 0.0005062657292000949, 0.01998251862823963, 0.03453022614121437, 0.036018405109643936, -0.022892339155077934, 
-0.03786933794617653, 0.012512452900409698, -0.022280896082520485, -0.062353167682886124, 0.03337495028972626, 0.024999108165502548, -0.0029817549511790276, -0.031268589198589325, 0.044541239738464355, -0.02072739228606224, -0.050438929349184036, 0.02719254419207573, 0.004596244543790817, 0.011618748307228088, -0.01557834167033434, -0.010279979556798935, 0.03370494395494461, 0.013912439346313477, -0.048488665372133255, -0.02585037238895893, 0.015299174934625626, 0.029496680945158005, 0.008933359757065773, 0.02277173474431038, 0.05251256003975868, -0.05872463062405586, -0.016355587169528008, 0.004389138426631689, -0.00495615741237998, -0.007998046465218067, -0.017453569918870926, -0.05093001201748848, -0.0351133830845356, -0.041876066476106644, -0.029814543202519417, -0.03136188164353371, 0.016929319128394127, 0.03695201873779297, 0.016865508630871773, 0.014302444644272327, 0.00778863113373518, 0.01614721119403839, 0.0008381388615816832, 0.03724338114261627, 0.028707211837172508, 0.03142763674259186, 0.021611139178276062, 0.021806832402944565, 0.047581594437360764, -0.027289921417832375, -0.022526947781443596, 0.022611716762185097, 0.020773418247699738, -0.04105996713042259, -0.007196053396910429, -0.03604191541671753, -0.019560595974326134, 0.037935834378004074, 0.033026695251464844, -0.005511813331395388, 0.021567588672041893, -0.033023592084646225, 0.030616018921136856, -0.013104697689414024, -0.021958576515316963, 0.0036418866366147995, 0.029302649199962616, 0.037027787417173386, -0.02399144321680069, -0.013686261139810085, 0.0018302856478840113, 0.01949324831366539, -0.030949236825108528, -0.021212492138147354, -0.016473693773150444, 0.011225881986320019, 0.030636651441454887, -0.006342783570289612, 0.02723691612482071, -0.02128232643008232, -0.014434587210416794, -0.01231427676975727, -0.00016167352441698313, -0.0018798435339704156, 0.05209754407405853, -0.035325922071933746, -0.06617750972509384, 0.032595399767160416, -0.0140606090426445, 
0.061001088470220566, 0.07049667835235596, -0.006168829742819071, 0.005293157417327166, -0.0500921756029129, -0.03336144611239433, -0.015566729940474033, 0.029270630329847336, -0.013397508300840855, -0.03303978592157364, 0.0014945340808480978, 0.007068088743835688, -0.01438178587704897, 0.02100301906466484, 0.04071950167417526, -0.038102887570858, 0.03645922616124153, -0.016191011294722557, 0.009750722907483578, 0.043142158538103104, 0.035731472074985504, 0.035325128585100174, -0.016262007877230644, -0.019360443577170372, -0.013566973619163036, -0.024645064026117325, 0.05361154302954674, -0.008533909916877747, 0.049023713916540146, 0.024370431900024414, 0.043964188545942307, -0.014774036593735218, -0.01016101986169815, -0.053560011088848114, 0.00633039278909564, -0.04033401980996132, 0.022206826135516167, -0.0013751634396612644, -0.039880502969026566, -0.06964974850416183, 0.00015011655341368169, -0.0005040596006438136, 0.06797170639038086, 0.038351405411958694, 0.07828045636415482, 0.022008784115314484, 0.0053893690928816795, 0.017445171251893044, 0.0001045529279508628, 0.028055237606167793, 0.031835198402404785, 0.014127431437373161, -0.04564820975065231, -0.04990760236978531, -0.010870479047298431, 0.00590931111946702, 0.04774388670921326, 0.005949124693870544, -0.009914560243487358, -0.05552608519792557, -0.01754576340317726, -0.008860033936798573, -0.041030947118997574, -0.05164513364434242, 0.012700805440545082, 0.06362908333539963, -0.02331583760678768, 0.013669340871274471, 0.01628698967397213, 0.28642964363098145, -0.029474947601556778, 0.010040823370218277, -0.04473850503563881, 0.03887912258505821, 0.006894052028656006, 0.022399306297302246, -0.0007777810678817332, -0.015402176417410374, 0.00022744185116607696, 0.002521283458918333, 0.012183410115540028, 0.03795032575726509, -0.039392970502376556, 0.003694776212796569, 0.008819295093417168, 0.011878496035933495, -0.0031687794253230095, 0.014764891937375069, 0.026465527713298798, 0.005468123592436314, 
-0.014688597992062569, 0.01040888112038374, -0.0008929153555072844, 0.031062860041856766, 0.005883797537535429, 0.0022633729968219995, 0.03287532553076744, -0.021614322438836098, 0.061867572367191315, 0.03122800588607788, -0.014700393192470074, 0.04368419200181961, -0.0021616965532302856, 0.004687662236392498, -0.03353721648454666, -0.018313046544790268, 0.012886499986052513, -0.017576541751623154, 0.013265104033052921, 0.016103144735097885, 0.008023981004953384, -0.04410070553421974, -0.009516937658190727, 0.02651200443506241, -0.04530531167984009, 0.0014347118558362126, 0.000501639093272388, -0.01901644468307495, -0.03256046399474144, -0.06451267749071121, 0.025237565860152245, -0.03076229616999626, -0.010278871282935143, -0.025683104991912842, 0.0009958067676052451, 0.0017538181273266673, -0.04831396043300629, -0.024519668892025948, 0.06815528124570847, -0.024713266640901566, 0.044039588421583176, 0.0026243943721055984, -0.020195700228214264, 0.025076016783714294, 0.01313438918441534, 0.03032136708498001, 0.04385117068886757, -0.018431151285767555, -0.07384984195232391, -0.03507034108042717, -0.05392982065677643, -0.008397692814469337]\n"
     ]
    }
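   ],
   "source": [
    "from typing import List\n",
    "\n",
    "from sentence_transformers import SentenceTransformer\n",
    "\n",
    "# Load a pretrained Chinese sentence-embedding model.\n",
    "embedding_model = SentenceTransformer(\"shibing624/text2vec-base-chinese\")\n",
    "\n",
    "def embed_chunk(chunk: str) -> List[float]:\n",
    "    \"\"\"Encode one chunk into an L2-normalized embedding, returned as a list of floats.\"\"\"\n",
    "    embedding = embedding_model.encode(chunk, normalize_embeddings=True)\n",
    "    return embedding.tolist()\n",
    "\n",
    "\n",
    "# Sanity check on a short test string (\"测试内容\" means \"test content\").\n",
    "embedding = embed_chunk(\"测试内容\")\n",
    "print(len(embedding))  # embedding dimension\n",
    "print(embedding)"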
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "87f48192-d9f7-4270-ae08-e5e0300bbb32",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "[-0.019575245678424835, 0.007184491492807865, 0.0230699609965086, -0.01243650820106268, 0.039207495748996735, -0.05374187231063843, 0.028527164831757545, -0.021042034029960632, -0.0017696103313937783, 0.04136236757040024, -0.0251983180642128, -0.055938079953193665, 0.07257921248674393, 0.021626586094498634, -0.004362869542092085, -0.0002865218557417393, 0.06021151319146156, 0.026215165853500366, -0.04922764375805855, 0.009307719767093658, 0.013933569192886353, -0.005938108544796705, -0.03683418780565262, 0.02330171689391136, 0.010850700549781322, 0.004264346323907375, 0.0037719786632806063, -0.024697497487068176, 0.001359243062324822, 0.05580892413854599, 0.021838387474417686, 0.04607835412025452, -0.06695900857448578, 0.029105717316269875, 0.01936657726764679, -0.021051201969385147, 0.015360496006906033, -0.0030887459870427847, 0.010731671005487442, 0.022035397589206696, 0.03437013179063797, 0.04636264592409134, -0.057696688920259476, -0.05955008417367935, 0.0017392536392435431, 0.055718954652547836, 0.00042800468509085476, 0.04777652770280838, -0.03258340060710907, 0.033878933638334274, -0.055904045701026917, 0.31629103422164917, 0.031006360426545143, -0.024298107251524925, -0.009092324413359165, 0.06546436250209808, 0.02334495075047016, 0.006341013591736555, 0.018706809729337692, 0.02299094758927822, 0.01383393257856369, -0.013355079106986523, -0.016584251075983047, -0.024968000128865242, -0.014794443733990192, -0.013685979880392551, -0.024329233914613724, -0.03959381580352783, -0.04097596928477287, -0.04594763368368149, 0.029489226639270782, 0.03400907665491104, -0.06798189878463745, -0.0006202030926942825, 0.03166961669921875, 0.002544148825109005, 0.03436625748872757, -0.020243555307388306, 0.0021672972943633795, 0.010164218954741955, 0.008175147697329521, -0.06498735398054123, -0.008412368595600128, -0.038191720843315125, -0.005446275230497122, -0.01196873839944601, 0.009688464924693108, -0.019435057416558266, 0.07741902768611908, -0.0471029058098793, 
-0.01897330768406391, -0.032317835837602615, -0.022417018190026283, 0.02651425078511238, -0.0012653149897232652, -0.012755224481225014, 0.026539402082562447, -0.016730744391679764, -0.032214272767305374, -0.03031826764345169, 0.030648913234472275, -0.008278362452983856, -0.04636301472783089, -0.04409271106123924, -2.8772444693458965e-07, -0.031140772625803947, 0.03442675247788429, 0.00857746135443449, 0.0038429026026278734, 0.03769105672836304, -0.025073133409023285, -0.0488264262676239, 0.049011632800102234, 0.04935596138238907, -0.009123008698225021, 0.05134636536240578, -0.04378075525164604, 0.03778235614299774, -0.005146006587892771, 0.04410054534673691, 0.022870449349284172, 0.025381723418831825, -0.008953483775258064, -0.018898386508226395, -0.004075912293046713, 0.009194892831146717, 0.003678363049402833, -0.010642238892614841, -0.0008922743145376444, 0.0057873050682246685, 0.010112910531461239, -0.009371640160679817, -0.08127480000257492, -0.0002079110563499853, 0.01273820735514164, -0.02297130785882473, -0.04988085851073265, -0.04311223328113556, 0.004414511378854513, -0.03586799278855324, 0.04770403727889061, -0.016997870057821274, 0.04105355963110924, 0.02105812355875969, 0.05482717975974083, -0.018240079283714294, 0.033797506242990494, -0.014233243651688099, 0.012422526255249977, -0.013608125038444996, 0.018477661535143852, 0.01952444389462471, -0.053328532725572586, 0.05681048706173897, 0.003922651521861553, -0.03331359848380089, -0.050931937992572784, 0.0021759732626378536, -0.0723157599568367, 0.024132372811436653, -0.023186879232525826, 0.019143985584378242, -0.008616605773568153, 0.02687547355890274, 0.027584146708250046, 0.0437968410551548, -0.025746481493115425, 0.012599845416843891, 0.09022200107574463, -0.020604398101568222, -0.03563583642244339, -0.006897601764649153, 0.014002720825374126, 0.017634963616728783, 0.02875705622136593, 0.029499614611268044, 0.04638306051492691, -0.02449882961809635, -0.06291420012712479, -0.049077264964580536, 
-0.021351445466279984, 0.02646530419588089, 0.04578239843249321, -0.005862803664058447, -0.03215060755610466, 0.0030407581944018602, -0.010758412070572376, -0.010290556587278843, 0.023174867033958435, 0.007631390355527401, -0.03857005015015602, -0.005427138414233923, -0.04518408328294754, -0.012234404683113098, 0.009643825702369213, 0.025938337668776512, 0.022814003750681877, -0.03978919982910156, -0.04089175537228584, -0.011615651659667492, -0.03620956093072891, 0.0012964779743924737, -0.020042888820171356, -0.05397961661219597, 0.008687211200594902, 0.03580188751220703, -0.03461933508515358, -0.03228490799665451, 0.021346954628825188, 0.01985393464565277, -0.007873179391026497, 0.0013645747676491737, 0.014947504736483097, 0.01498852577060461, 0.027462556958198547, 6.532626139232889e-05, -0.03034496121108532, 0.002699723932892084, -0.02072504349052906, 0.03037245385348797, -0.04398975148797035, -0.028308289125561714, 0.029070334509015083, 0.0018611660925671458, -0.01114792749285698, 0.036018066108226776, -0.00828554481267929, -0.043436095118522644, -0.0023190160281956196, 0.030167322605848312, -0.004698456730693579, -0.015231151133775711, -0.04264411702752113, -0.022069064900279045, -0.001220513484440744, -0.018037280067801476, -0.04970848187804222, 0.008847872726619244, -0.048723313957452774, -0.03382815048098564, 0.04059728980064392, 0.00846572034060955, -0.02522672526538372, -0.000658822653349489, 0.017670080065727234, 0.01355036348104477, 0.005875402130186558, -0.027556147426366806, 0.002868497045710683, 0.005567203275859356, 0.05288984626531601, 0.0027863294817507267, 0.011704753153026104, -0.054450131952762604, 0.04906482994556427, 0.029993437230587006, -0.0010608627926558256, -0.054688435047864914, 0.021854421123862267, 0.017882581800222397, 0.04986060410737991, -0.08198913931846619, 0.027324451133608818, -0.002917977049946785, 0.032340794801712036, -0.020740211009979248, -0.0005490778130479157, -0.028165724128484726, -0.03474273905158043, 
0.0709252804517746, -0.025202788412570953, -0.0014516275841742754, 0.010104898363351822, -0.055702053010463715, -0.002845396287739277, -0.014473482966423035, 0.024009745568037033, -0.029762564226984978, -0.038814254105091095, 0.007417516782879829, -0.005394815467298031, 0.02193508669734001, 0.02445455640554428, 0.00044721539597958326, 0.0007251782226376235, -0.019695673137903214, -0.02186840958893299, 9.935082925949246e-05, -0.03830121085047722, -0.03355370834469795, -0.018670987337827682, -0.0567297525703907, 0.02321363054215908, -0.00461058272048831, 0.06102447584271431, -0.038619477301836014, -0.029414284974336624, -0.023253360763192177, 0.0623246431350708, -0.020949050784111023, 0.008914770558476448, 0.019913863390684128, -0.014146704226732254, 0.0235853660851717, -0.029241418465971947, -0.015371200628578663, 0.07652343064546585, 0.031256020069122314, -0.02261676825582981, 0.03711772710084915, -0.01530705951154232, -0.004552348051220179, 0.02520122192800045, -0.03407280892133713, 0.0007222206331789494, 0.024411393329501152, -0.0017289652023464441, 0.013812338002026081, 0.08538073301315308, 0.02314557135105133, 0.02020253799855709, 0.05879047513008118, 0.04270472005009651, 0.0596114881336689, -0.0450265072286129, -0.0332917645573616, -0.010204403661191463, -0.03146468475461006, -0.011219728738069534, 0.005433768965303898, -0.005610175896435976, -0.013676035217940807, -0.014545651152729988, 0.05427214875817299, -0.009991297498345375, 0.018363304436206818, 0.028053363785147667, 0.004931776784360409, -0.04974066838622093, -0.07254651933908463, 0.0021212000865489244, -0.03046802431344986, -0.030541852116584778, -0.03873365372419357, 0.01498089637607336, -0.02760278806090355, -0.0015530887758359313, -0.026116494089365005, -0.07665716111660004, 0.03843327611684799, 0.029879992827773094, 0.03893926739692688, 0.022966785356402397, -0.006102449726313353, -0.01258455216884613, -0.010450861416757107, 0.03052501007914543, 0.023697979748249054, 0.019688814878463745, 
0.019036853685975075, -0.032379504293203354, 0.02658098191022873, -0.00813382025808096, -0.011671945452690125, 0.029618598520755768, 0.009453977458178997, 0.03254873305559158, -0.03417671099305153, -0.01655859872698784, -0.0026682557072490454, 0.02670803666114807, 0.057674720883369446, 0.012620360590517521, -0.020122312009334564, -0.043604157865047455, -0.008301245979964733, -0.027785686776041985, -0.012901553884148598, 0.027029333636164665, 0.036296822130680084, 0.035488881170749664, -0.032955534756183624, 0.021105023100972176, -0.04240453615784645, 0.026610231027007103, -0.02620500884950161, 0.001453590695746243, 0.004402652382850647, -0.014022737741470337, -0.03229152411222458, -0.010603949427604675, -0.01765235885977745, -0.001992778852581978, -0.049847058951854706, -0.0360260084271431, 0.005037260707467794, -0.04908370599150658, 0.013484413735568523, -0.026446633040905, 0.019590048119425774, -0.0242310743778944, -0.009068716317415237, -0.01086487341672182, -0.02524659037590027, -0.00032512383768334985, -0.016430040821433067, 0.019867973402142525, 0.03290142863988876, 0.05426551774144173, -0.07008343189954758, -0.02170533686876297, 0.04489173740148544, -0.004063666798174381, -0.020953137427568436, 0.005982236936688423, -0.03865432366728783, -0.061571188271045685, 0.005439183209091425, 0.02711702324450016, -0.00844859890639782, -0.01644068956375122, -0.025952104479074478, 0.022699229419231415, -0.02252841927111149, 0.06538811326026917, -0.08372665196657181, -0.05089442804455757, -0.03153529763221741, -0.014182078652083874, -0.024516424164175987, 0.05083749070763588, 0.0011483040871098638, -0.011152977123856544, 0.019425589591264725, -0.004413402173668146, -0.011585778556764126, -0.008473851718008518, 0.006571247708052397, 0.005964476149529219, 0.004132492933422327, -0.03928951919078827, 0.016739295795559883, 0.05932611972093582, 0.029311416670680046, 0.03729157894849777, -0.032178059220314026, 0.09054070711135864, 0.003541381796821952, -0.020401066169142723, 
0.033569060266017914, 0.0023426536936312914, 0.0011930367909371853, -0.009094730019569397, -0.05840180441737175, 0.07354900240898132, -0.020121946930885315, 0.012296526692807674, -0.0022190846502780914, -0.006940481718629599, 0.05098889395594597, 0.02511655166745186, -0.020064709708094597, 0.0253569595515728, 0.05129018798470497, 0.02122393622994423, -0.0052771237678825855, -0.05134673789143562, 0.016074007377028465, 0.003490179544314742, 0.07571224868297577, 0.060608457773923874, 0.019657278433442116, -0.006239684298634529, -0.021264441311359406, 0.01771719567477703, -0.028178954496979713, -0.05186501517891884, -0.03138470649719238, -0.02705828659236431, -0.10381457209587097, 0.030338741838932037, -0.06478020548820496, -0.0609833225607872, 0.021071558818221092, 0.02529301308095455, 0.06755193322896957, -0.03659011051058769, -0.03200326859951019, 0.021191291511058807, 0.012678796425461769, -0.02029627561569214, -0.006004428956657648, 0.04356565326452255, 0.00836500059813261, -0.031111784279346466, -0.14221757650375366, 0.01328841969370842, -0.029158761724829674, -0.03101990930736065, 0.014649796299636364, 0.022822462022304535, -0.0026551797054708004, 0.04423389956355095, 0.040543314069509506, -0.014595601707696915, 0.011473297141492367, -0.09130798280239105, 0.005705502349883318, 0.007437996566295624, 0.0034434664994478226, -0.012914009392261505, -0.028172073885798454, 0.041753433644771576, -0.026228172704577446, 0.05791380628943443, 0.011094313114881516, -0.020072562620043755, 0.01836261712014675, 0.039502501487731934, 0.014905666001141071, 0.05232274904847145, -0.013581609353423119, -0.026190588250756264, 0.02168404497206211, -0.048471659421920776, 0.017087092623114586, -0.03627796471118927, 0.010164468549191952, -0.05211496725678444, 0.029964536428451538, 0.01733250729739666, 0.05204572528600693, -0.07522932440042496, -0.040387608110904694, 0.024310916662216187, 0.044961269944906235, 0.053168393671512604, 0.0563988983631134, 0.04699467867612839, 
-0.01425087545067072, -0.01962447538971901, 0.0009377664537169039, 0.03109448216855526, -0.017893271520733833, -0.03482420742511749, -0.00553397461771965, 0.01584434136748314, -0.01657920330762863, -0.01025377307087183, -0.004756081383675337, -0.009247601963579655, -0.030618904158473015, 0.008117277175188065, 0.019564911723136902, 0.019775429740548134, -0.0282338447868824, -0.07369294762611389, 0.002558677224442363, -0.009378043003380299, -0.0032988188322633505, -0.002708825981244445, -0.05723098665475845, 0.010283329524099827, -0.04216751456260681, -0.03646428883075714, -0.016274282708764076, 0.020370280370116234, -0.032444506883621216, -0.051685333251953125, -0.0017739745089784265, 0.02701401337981224, -0.018742769956588745, 0.0422854982316494, 0.08014562726020813, -0.024416904896497726, 0.017382947728037834, -0.011250962503254414, -0.039670802652835846, 0.00026210586656816304, -0.03529917821288109, 0.03928028792142868, 0.014210451394319534, 0.03159512206912041, -0.04646536707878113, 0.020431041717529297, 0.0037060948088765144, 0.0017987617757171392, -0.0563591793179512, -0.0013847778318449855, 0.014804890379309654, -0.017672276124358177, 0.00022005234495736659, 0.009399843402206898, 0.05996820330619812, -0.014980463311076164, 0.015514206141233444, -0.05771438032388687, 0.00987581443041563, 0.029005266726017, 0.003136218525469303, 0.0012562021147459745, 0.10040707886219025, -0.026372293010354042, -0.04681917652487755, 0.029139017686247826, -0.00747465156018734, 0.04505523294210434, 0.00872877798974514, -0.011935601010918617, -0.07580943405628204, -0.03477560728788376, 0.008312718942761421, -0.05544694885611534, 0.04934819042682648, -0.007774960249662399, 0.0437404066324234, -0.0200048740953207, 0.0344821959733963, 0.010883690789341927, -0.06000569462776184, -0.01871415786445141, -0.03956103324890137, -0.014007221907377243, -0.011436213739216328, 0.00879357848316431, -0.004188906401395798, -0.02302643656730652, 0.0007935506873764098, 0.006244358140975237, 
0.029402926564216614, -0.03749294579029083, 0.002670776564627886, 0.06962329149246216, -0.048934899270534515, 0.00912968534976244, 0.03313244879245758, -0.007266637869179249, -0.012852273881435394, 0.003959826659411192, 0.019211051985621452, 0.0336722768843174, 0.024616379290819168, -0.027333512902259827, -0.04864327609539032, 0.07117066532373428, 0.006693967618048191, 0.04965031519532204, 0.004121762700378895, -0.01404201053082943, 0.004240276757627726, -0.0017320234328508377, 0.0068808989599347115, 0.05296016111969948, 0.012016470544040203, 0.008241631090641022, -0.05940234288573265, 0.07657935470342636, -0.03656349331140518, -0.052963737398386, -0.04976905509829521, 0.02750285156071186, -0.007105721160769463, -0.03657199442386627, -0.03655023127794266, -0.0018085071351379156, -0.07224300503730774, 0.009658225812017918, -0.011073477566242218, 0.008183677680790424, 0.011940588243305683, 0.11663933843374252, 0.07610754668712616, -0.006856619380414486, -0.003499114653095603, -0.019065378233790398, 0.030794864520430565, 0.009789181873202324, 0.016759183257818222, 0.02532178722321987, 0.002362719038501382, -0.025858167558908463, 0.01238317135721445, -0.0023598382249474525, -0.018748808652162552, -0.015227556228637695, -0.0225530955940485, -0.040607284754514694, -0.02734004706144333, -0.011350806802511215, -0.004592766519635916, -0.036703258752822876, 0.04418798163533211, -0.003862511832267046, -0.03565922752022743, 0.03560401871800423, 0.12003621459007263, -0.049474798142910004, 0.013182581402361393, 0.004166050814092159, 0.039744794368743896, 0.023967018350958824, -0.014209223911166191, 0.04398909956216812, -0.06685987114906311, 0.0017546801827847958, 0.022533733397722244, -0.050653111189603806, -0.09923341870307922, -0.00455934414640069, -0.0017454950138926506, -0.038001276552677155, 0.04672863334417343, -0.013196361251175404, 0.04957878589630127, 0.017210129648447037, -0.04172074794769287, 0.01158106792718172, 0.06164977326989174, 0.054958276450634, 
0.02367612160742283, 0.022868258878588676, -0.026328960433602333, 0.050871070474386215, 0.007852423004806042, 0.04937553033232689, -0.04756603389978409, 0.034248221665620804, 0.05096571892499924, 0.035475172102451324, 0.0010320090223103762, -0.01896739937365055, 0.03736281022429466, 0.012608665972948074, -0.039889540523290634, -0.05007623881101608, 0.04336465522646904, -0.02073860354721546, 0.08724326640367508, -0.019608423113822937, -0.00797133706510067, 0.00957647617906332, -0.018797820433974266, 0.007389036938548088, -0.010044367052614689, -0.0030236777383834124, 0.012985949404537678, 0.013595478609204292, 0.036131955683231354, -0.03293904662132263, 0.004596341866999865, 0.0211248230189085, -0.03887394815683365, -0.0007361882016994059, 0.0021365273278206587, 0.016914186999201775, -0.048614129424095154, 0.08700255304574966, -0.029589170590043068, 0.06149931997060776, -0.013487757183611393, -0.003979105968028307, 0.020023049786686897, 0.09603303670883179, 0.02000705525279045, -0.019597278907895088, -0.00282137468457222, -0.05546524003148079, -0.0405719131231308]\n"
     ]
    }
   ],
   "source": [
    "embeddings = [embed_chunk(chunk) for chunk in chunks]\n",
    "\n",
    "print(len(embeddings))\n",
    "print(embeddings[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "babfbd91-76fc-4467-9ff7-ccaf5ffbbd54",
   "metadata": {},
   "outputs": [],
   "source": [
    "import chromadb\n",
    "\n",
    "chromadb_client = chromadb.EphemeralClient()\n",
    "chromadb_collection = chromadb_client.get_or_create_collection(name=\"default\")\n",
    "\n",
    "def save_embeddings(chunks: List[str], embeddings: List[List[float]]) -> None:\n",
    "    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):\n",
    "        chromadb_collection.add(\n",
    "            documents=[chunk],\n",
    "            embeddings=[embedding],\n",
    "            ids=[str(i)]\n",
    "        )\n",
    "\n",
    "save_embeddings(chunks, embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9e47b06d-3f7a-40bd-886a-aca6c7e19f0b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0] # 哆啦A梦与超级赛亚人：时空之战\n",
      "\n",
      "[1] 三件秘密道具分别是：可以临时赋予超级战力的“复制斗篷”，能暂停时间五秒的“时间停止手表”，以及可在一分钟中完成一年修行的“精神与时光屋便携版”。大雄被推进精神屋内，在其中接受密集的训练，虽然只有几分钟现实时间，他却经历了整整一年的苦修。刚开始他依旧软弱，想放弃、想逃跑，但当他想起静香、父母，还有哆啦A梦那坚定的眼神时，他终于咬牙坚持了下来。出来之后，他的身体与精神都焕然一新，眼神中多了一份成熟与自信。\n",
      "\n",
      "[2] 最终战在黑暗赛亚人的空中要塞前爆发，特兰克斯率先出击，释放全力与敌人正面对决。哆啦A梦则用任意门和道具支援，从各个方向制造混乱，尽量压制敌人的时空能力。但黑暗赛亚人太过强大，仅凭特兰克斯一人根本无法压制，更别说击败。就在特兰克斯即将被击倒之际，大雄披上复制斗篷、冲破恐惧从高空跃下。他的拳头燃烧着金色光焰，目标直指敌人心脏。\n",
      "\n",
      "[3] 战后，未来世界开始恢复，植物重新生长，人类重建家园。特兰克斯告别时紧紧握住大雄的手，说：“你是我见过最特别的战士。”哆啦A梦也为大雄感到骄傲，说他终于真正成长了一次。三人站在山丘上，看着远方重新明亮的地平线，心中感受到从未有过的安宁。随后，哆啦A梦与大雄乘坐时光机返回了属于他们的那个年代，一切仿佛又恢复平静。\n",
      "\n",
      "[4] 哆啦A梦与大雄听后大惊，但也从特兰克斯坚定的眼神中读出了不容拒绝的决心。特兰克斯解释说，未来的敌人并非普通反派，而是一个名叫“黑暗赛亚人”的存在，他由邪恶科学家复制了贝吉塔的基因并加以改造，实力超乎想象。这个敌人不仅拥有赛亚人战斗力，还能操纵扭曲的时间能量，几乎无人可敌。特兰克斯已经独自战斗多年，但每一次都以惨败告终。他说：“科技，是我那个时代唯一缺失的武器，而你们，正好拥有它。”\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def retrieve(query: str, top_k: int) -> List[str]:\n",
    "    query_embedding = embed_chunk(query)\n",
    "    results = chromadb_collection.query(\n",
    "        query_embeddings=[query_embedding],\n",
    "        n_results=top_k\n",
    "    )\n",
    "    return results['documents'][0]\n",
    "\n",
    "query = \"哆啦A梦使用的3个秘密道具分别是什么？\"\n",
    "retrieved_chunks = retrieve(query, 5)\n",
    "\n",
    "for i, chunk in enumerate(retrieved_chunks):\n",
    "    print(f\"[{i}] {chunk}\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e57ac85d-d634-4c1d-93fa-e627cf09a6f1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0] 三件秘密道具分别是：可以临时赋予超级战力的“复制斗篷”，能暂停时间五秒的“时间停止手表”，以及可在一分钟中完成一年修行的“精神与时光屋便携版”。大雄被推进精神屋内，在其中接受密集的训练，虽然只有几分钟现实时间，他却经历了整整一年的苦修。刚开始他依旧软弱，想放弃、想逃跑，但当他想起静香、父母，还有哆啦A梦那坚定的眼神时，他终于咬牙坚持了下来。出来之后，他的身体与精神都焕然一新，眼神中多了一份成熟与自信。\n",
      "\n",
      "[1] 最终战在黑暗赛亚人的空中要塞前爆发，特兰克斯率先出击，释放全力与敌人正面对决。哆啦A梦则用任意门和道具支援，从各个方向制造混乱，尽量压制敌人的时空能力。但黑暗赛亚人太过强大，仅凭特兰克斯一人根本无法压制，更别说击败。就在特兰克斯即将被击倒之际，大雄披上复制斗篷、冲破恐惧从高空跃下。他的拳头燃烧着金色光焰，目标直指敌人心脏。\n",
      "\n",
      "[2] 战后，未来世界开始恢复，植物重新生长，人类重建家园。特兰克斯告别时紧紧握住大雄的手，说：“你是我见过最特别的战士。”哆啦A梦也为大雄感到骄傲，说他终于真正成长了一次。三人站在山丘上，看着远方重新明亮的地平线，心中感受到从未有过的安宁。随后，哆啦A梦与大雄乘坐时光机返回了属于他们的那个年代，一切仿佛又恢复平静。\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from sentence_transformers import CrossEncoder\n",
    "\n",
    "def rerank(query: str, retrieved_chunks: List[str], top_k: int) -> List[str]:\n",
    "    cross_encoder = CrossEncoder('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')\n",
    "    pairs = [(query, chunk) for chunk in retrieved_chunks]\n",
    "    scores = cross_encoder.predict(pairs)\n",
    "\n",
    "    scored_chunks = list(zip(retrieved_chunks, scores))\n",
    "    scored_chunks.sort(key=lambda x: x[1], reverse=True)\n",
    "\n",
    "    return [chunk for chunk, _ in scored_chunks][:top_k]\n",
    "\n",
    "reranked_chunks = rerank(query, retrieved_chunks, 3)\n",
    "\n",
    "for i, chunk in enumerate(reranked_chunks):\n",
    "    print(f\"[{i}] {chunk}\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "79d844d8-846e-4a88-a19f-c8e282839b99",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "你是一位知识助手，请根据用户的问题和下列片段生成准确的回答。\n",
      "\n",
      "用户问题: 哆啦A梦使用的3个秘密道具分别是什么？\n",
      "\n",
      "相关片段:\n",
      "三件秘密道具分别是：可以临时赋予超级战力的“复制斗篷”，能暂停时间五秒的“时间停止手表”，以及可在一分钟中完成一年修行的“精神与时光屋便携版”。大雄被推进精神屋内，在其中接受密集的训练，虽然只有几分钟现实时间，他却经历了整整一年的苦修。刚开始他依旧软弱，想放弃、想逃跑，但当他想起静香、父母，还有哆啦A梦那坚定的眼神时，他终于咬牙坚持了下来。出来之后，他的身体与精神都焕然一新，眼神中多了一份成熟与自信。\n",
      "\n",
      "最终战在黑暗赛亚人的空中要塞前爆发，特兰克斯率先出击，释放全力与敌人正面对决。哆啦A梦则用任意门和道具支援，从各个方向制造混乱，尽量压制敌人的时空能力。但黑暗赛亚人太过强大，仅凭特兰克斯一人根本无法压制，更别说击败。就在特兰克斯即将被击倒之际，大雄披上复制斗篷、冲破恐惧从高空跃下。他的拳头燃烧着金色光焰，目标直指敌人心脏。\n",
      "\n",
      "战后，未来世界开始恢复，植物重新生长，人类重建家园。特兰克斯告别时紧紧握住大雄的手，说：“你是我见过最特别的战士。”哆啦A梦也为大雄感到骄傲，说他终于真正成长了一次。三人站在山丘上，看着远方重新明亮的地平线，心中感受到从未有过的安宁。随后，哆啦A梦与大雄乘坐时光机返回了属于他们的那个年代，一切仿佛又恢复平静。\n",
      "\n",
      "请基于上述内容作答，不要编造信息。仅仅回答问题，不需要展开论证，也不需要回答其他额外信息\n",
      "\n",
      "---\n",
      "\n",
      "------------------------------------------------------------\n",
      "哆啦A梦使用的3个秘密道具分别是：复制斗篷、时间停止手表、精神与时光屋便携版。\n"
     ]
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "from google import genai\n",
    "\n",
    "load_dotenv()\n",
    "google_client = genai.Client()\n",
    "\n",
    "def generate(query: str, chunks: List[str]) -> str:\n",
    "    prompt = f\"\"\"你是一位知识助手，请根据用户的问题和下列片段生成准确的回答。\n",
    "\n",
    "用户问题: {query}\n",
    "\n",
    "相关片段:\n",
    "{\"\\n\\n\".join(chunks)}\n",
    "\n",
    "请基于上述内容作答，不要编造信息。仅仅回答问题，不需要展开论证，也不需要回答其他额外信息\"\"\"\n",
    "\n",
    "    print(f\"{prompt}\\n\\n---\\n\")\n",
    "\n",
    "    response = google_client.models.generate_content(\n",
    "        model=\"gemini-2.5-flash\",\n",
    "        contents=prompt\n",
    "    )\n",
    "\n",
    "    return response.text\n",
    "\n",
    "answer = generate(query, reranked_chunks)\n",
    "print('--'*30)\n",
    "print(answer)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
